 ancient text


Report on foundation model impacts released

AIHub

Partnership on AI has published a progress report on post-deployment governance practices for foundation models. The document, entitled "2026 Transparency Report on Foundation Model Impacts", measures the progress of 13 foundation model providers in publicly documenting the impacts of their foundation models. In carrying out their analysis, authors Jacob Pratt and Albert Tanjaya reviewed more than 150 papers, articles, websites, and reports. For the assessment, four post-deployment governance practices were broken down into 19 processes, or activities, that support how foundation model providers adopt those practices. Although several leading organizations are defining what information to share and how, the rest are slow to adopt information-sharing practices.


AI for Science – from cosmology to chemistry

AIHub

On 31 March, our editorial team headed to the Royal Society for AI for Science. This day-long conference explored how AI is changing the nature of scientific discovery, and was hosted by the Fundamental Research team from the Alan Turing Institute. Nestled in a terrace of 19th-century townhouses along the banks of the Thames, the Royal Society looks as grand as the names who have passed through its doors throughout the years. Prof Jason McEwen, Chief Scientist for the Turing Institute, opened the event with an insightful talk on the nature of scientific revolution, and how the bidirectional relationship between AI and science could spark the next one. Then, Prof Anna Scaife from the University of Manchester spoke on the use of foundation models for astronomical discovery.


Maryna Viazovska's proofs of sphere packing formalized with AI

AIHub

The proofs that earned EPFL professor Maryna Viazovska the Fields Medal in 2022 have reached a new milestone: their complete formalization by computer, achieved through a collaboration between mathematicians and artificial intelligence tools. In 2016, Maryna Viazovska solved the sphere packing problem in dimension 8, proving that the E8 lattice constitutes the densest possible arrangement. Shortly after, together with collaborators, she established an analogous result in dimension 24 using the Leech lattice. Her method provided an elegant solution to a problem studied for centuries, with close ties to applied fields such as error-correcting codes. For this major contribution, Viazovska was awarded the Fields Medal in 2022, the highest distinction in mathematics.


InteChar: A Unified Oracle Bone Character List for Ancient Chinese Language Modeling

Diao, Xiaolei, Zhou, Zhihan, Shi, Lida, Wang, Ting, Qi, Ruihua, Xu, Hao, Shi, Daqian

arXiv.org Artificial Intelligence

Constructing historical language models (LMs) plays a crucial role in aiding archaeological provenance studies and understanding ancient cultures. However, existing resources present major challenges for training effective LMs on historical texts. First, the scarcity of historical language samples renders unsupervised learning approaches based on large text corpora highly inefficient, hindering effective pre-training. Moreover, due to the considerable temporal gap and complex evolution of ancient scripts, the absence of comprehensive character encoding schemes limits the digitization and computational processing of ancient texts, particularly in early Chinese writing. To address these challenges, we introduce InteChar, a unified and extensible character list that integrates unencoded oracle bone characters with traditional and modern Chinese. InteChar enables consistent digitization and representation of historical texts, providing a foundation for robust modeling of ancient scripts. To evaluate the effectiveness of InteChar, we construct the Oracle Corpus Set (OracleCS), an ancient Chinese corpus that combines expert-annotated samples with LLM-assisted data augmentation, centered on Chinese oracle bone inscriptions. Extensive experiments show that models trained with InteChar on OracleCS achieve substantial improvements across various historical language understanding tasks, confirming the effectiveness of our approach and establishing a solid foundation for future research in ancient Chinese NLP.


Intertextual Parallel Detection in Biblical Hebrew: A Transformer-Based Benchmark

Smiley, David M.

arXiv.org Artificial Intelligence

Identifying parallel passages in biblical Hebrew (BH) is central to biblical scholarship for understanding intertextual relationships. Traditional methods rely on manual comparison, a labor-intensive process prone to human error. This study evaluates the potential of pre-trained transformer-based language models, including E5, AlephBERT, MPNet, and LaBSE, for detecting textual parallels in the Hebrew Bible. Focusing on known parallels between Samuel/Kings and Chronicles, I assessed each model's capability to generate word embeddings distinguishing parallel from non-parallel passages. Using cosine similarity and Wasserstein Distance measures, I found that E5 and AlephBERT show promise; E5 excels in parallel detection, while AlephBERT demonstrates stronger non-parallel differentiation. These findings indicate that pre-trained models can enhance the efficiency and accuracy of detecting intertextual parallels in ancient texts, suggesting broader applications for ancient language studies.
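The comparison described in this abstract can be illustrated with a minimal sketch. It is not the author's code: the embeddings below are made-up two-dimensional toy vectors rather than outputs of E5 or AlephBERT, the function names are hypothetical, and the Wasserstein distance is computed with the simple closed form that holds only for two one-dimensional samples of equal size.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-sized 1-D samples.

    For equal sample sizes this reduces to the mean absolute
    difference between the sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Toy stand-ins for passage embeddings (assumption: real embeddings
# would come from a pre-trained model and have hundreds of dimensions).
parallel_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([0.0, 1.0], [0.0, 2.0]),
]
non_parallel_pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 0.0], [-1.0, 0.0]),
]

parallel_scores = [cosine_similarity(u, v) for u, v in parallel_pairs]
non_parallel_scores = [cosine_similarity(u, v) for u, v in non_parallel_pairs]

# A larger distance means the two score distributions are easier to
# tell apart, i.e. the embeddings separate parallels from non-parallels.
separation = wasserstein_1d(parallel_scores, non_parallel_scores)
print(separation)
```

Under these assumptions, a model whose similarity scores for known parallels sit far from its scores for non-parallels yields a larger distance, which is one way to read the separation the abstract reports.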


Efficiently Building a Domain-Specific Large Language Model from Scratch: A Case Study of a Classical Chinese Large Language Model

Li, Shen, Hu, Renfen, Wang, Lijun

arXiv.org Artificial Intelligence

General-purpose large language models demonstrate notable capabilities in language comprehension and generation, achieving results that are comparable to, or even surpass, human performance in many natural language processing tasks. Nevertheless, when general models are applied to some specific domains, e.g., Classical Chinese texts, their effectiveness is often unsatisfactory, and fine-tuning open-source foundational models similarly struggles to adequately incorporate domain-specific knowledge. To address this challenge, this study developed a large language model, AI Taiyan, specifically designed for understanding and generating Classical Chinese. Experiments show that with a reasonable model design, data processing, foundational training, and fine-tuning, satisfactory results can be achieved with only 1.8 billion parameters. In key tasks related to language processing of Classical Chinese such as punctuation, identification of allusions, explanation of word meanings, and translation between ancient and modern Chinese, this model exhibits a clear advantage over both general-purpose large models and domain-specific traditional models, achieving levels close to or surpassing human baselines. This research provides a reference for the efficient construction of specialized domain-specific large language models. Furthermore, the paper discusses the application of this model in fields such as the collation of ancient texts, dictionary editing, and language research, combined with case studies.


Automating Violence Detection and Categorization from Ancient Texts

Abdelhalim, Alhassan, Regneri, Michaela

arXiv.org Artificial Intelligence

Violence descriptions in literature offer valuable insights for a wide range of research in the humanities. For historians, depictions of violence are of special interest for analyzing the societal dynamics surrounding large wars and individual conflicts of influential people. Harvesting data for violence research manually is laborious and time-consuming. This study is the first one to evaluate the effectiveness of large language models (LLMs) in identifying violence in ancient texts and categorizing it across multiple dimensions. Our experiments identify LLMs as a valuable tool to scale up the accurate analysis of historical texts and show the effect of fine-tuning and data augmentation, yielding an F1-score of up to 0.93 for violence detection and 0.86 for fine-grained violence categorization.
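For readers unfamiliar with the metric quoted above, the F1-score is the harmonic mean of precision and recall. The sketch below shows the standard calculation; the function name and the counts are hypothetical and are not taken from the paper's experiments.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, from raw counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0  # nothing predicted or nothing to find
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0  # no correct predictions at all
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 violent passages correctly flagged,
# 2 flagged wrongly, 2 missed.
example_f1 = f1_score(8, 2, 2)
print(example_f1)
```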


Productivity Is a Drag. Work Is Divine.

The Atlantic - Technology

Why should humans do anything, if machines can do it better? The answer is crucial to the future of human civilization--and may just lie in religious texts from centuries ago. From the digital (Google searches and Slack chats) to the purely mechanical (washing machines and microwaves), humans use tools nearly constantly to enhance or replace our own labor. Those that save time and effort are easy to appreciate--I have yet to meet someone who misses scrubbing clothes by hand. But the rapid rise of artificial intelligence--which can now write essays and poetry, create art, and substitute for human interaction--has scrambled the relationship between technology and labor.


Restoring Ancient Ideograph: A Multimodal Multitask Neural Network Approach

Duan, Siyu, Wang, Jun, Su, Qi

arXiv.org Artificial Intelligence

Cultural heritage serves as the enduring record of human thought and history. Despite significant efforts dedicated to the preservation of cultural relics, many ancient artefacts have been ravaged irreversibly by natural deterioration and human actions. Deep learning technology has emerged as a valuable tool for restoring various kinds of cultural heritage, including ancient texts. Previous research has approached ancient text restoration from either visual or textual perspectives, often overlooking the potential of synergizing multimodal information. This paper proposes a novel Multimodal Multitask Restoring Model (MMRM) to restore ancient texts, particularly emphasising the ideograph. This model combines context understanding with residual visual information from damaged ancient artefacts, enabling it to predict damaged characters and generate restored images simultaneously. We tested the MMRM through experiments conducted on both simulated datasets and authentic ancient inscriptions. The results show that the proposed method gives insightful restoration suggestions in both simulation experiments and real-world scenarios. To the best of our knowledge, this work represents the pioneering application of multimodal deep learning in ancient text restoration, which will contribute to the understanding of ancient society and culture in digital humanities fields.


Researchers use AI to decipher ancient Roman texts carbonized in deadly Mount Vesuvius eruption

FOX News

A set of ancient texts burned by the volcanic eruption of Mount Vesuvius in 79 A.D. has been deciphered thanks to a team of researchers using AI. The nearly 2,000-year-old texts were unreadable after being charred in a villa in Herculaneum, a Roman town near Pompeii. Believed to have been owned by the father-in-law of Julius Caesar, the texts were carbonized by the heat of the volcanic debris.